Declarative Statistical Modeling with Datalog
Authors
Abstract
Formalisms for specifying general statistical models, such as probabilistic-programming languages, typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate a declarative framework for specifying statistical models on top of a database, through an appropriate extension of Datalog. By virtue of extending Datalog, our framework offers a natural integration with the database, and has a robust declarative semantics (that is, semantic independence from the algorithmic evaluation of rules, and semantic invariance under logical program transformations). Our proposed Datalog extension provides convenient mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program; these possible outcomes are minimal solutions with respect to a related program that involves existentially quantified variables in conclusions. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. We focus on programs that use discrete numerical distributions, but even then the space of possible outcomes may be uncountable (as a solution can be infinite). We define a probability measure over possible outcomes by applying the known concept of cylinder sets to a probabilistic chase procedure. We show that the resulting semantics is robust under different chases. We also identify conditions guaranteeing that all possible outcomes are finite (and then the probability space is discrete). 
We argue that the framework we propose retains the purely declarative nature of Datalog, and allows for natural specifications of statistical models.
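To make the semantics described above concrete, the following is a minimal sketch of a probabilistic chase over a tiny, finite example. Everything specific in it is an assumption made for illustration and is not taken from the paper: the relation names `House`, `Burglary`, and `Alarm`, the `flip` distribution, the two rules, and the probabilities. It covers only the case where every possible outcome is finite, so the probability space is discrete and no cylinder-set construction is needed. Each rule firing branches on the values of a discrete distribution, each leaf of the resulting tree is a possible outcome, and the observation is applied as an integrity constraint that filters and renormalizes those outcomes.

```python
from fractions import Fraction

# Hypothetical input database: House(h, p), where p is a burglary probability.
DATABASE = frozenset({
    ("House", "h1", Fraction(1, 10)),
    ("House", "h2", Fraction(1, 5)),
})


def flip(p):
    """Illustrative discrete distribution: 1 with probability p, else 0."""
    return [(1, p), (0, 1 - p)]


def next_firing(facts):
    """Return (head_prefix, distribution) for one rule not yet fired, or None.

    Assumed rules (hypothetical, for illustration only):
      Burglary(h, Flip[p])  :- House(h, p)
      Alarm(h, Flip[9/10])  :- Burglary(h, 1)
    A rule fires at most once per head prefix, so each sample is drawn once.
    """
    for fact in sorted(facts):
        if fact[0] == "House":
            head, dist = ("Burglary", fact[1]), flip(fact[2])
        elif fact[0] == "Burglary" and fact[2] == 1:
            head, dist = ("Alarm", fact[1]), flip(Fraction(9, 10))
        else:
            continue
        if not any(f[:2] == head for f in facts):
            return head, dist
    return None


def chase(facts, prob=Fraction(1)):
    """Yield (outcome, probability) for every leaf of the chase tree."""
    step = next_firing(facts)
    if step is None:
        yield frozenset(facts), prob
        return
    head, dist = step
    for value, p in dist:
        if p > 0:
            yield from chase(facts | {head + (value,)}, prob * p)


def some_alarm(outcome):
    """Observation as an integrity constraint: at least one alarm went off."""
    return any(f[0] == "Alarm" and f[2] == 1 for f in outcome)


def posterior(outcomes, constraint):
    """Keep outcomes that satisfy the constraint and renormalize."""
    kept = [(o, p) for o, p in outcomes if constraint(o)]
    total = sum(p for _, p in kept)
    return [(o, p / total) for o, p in kept]


prior = list(chase(DATABASE))
evidence = sum(p for o, p in prior if some_alarm(o))
post = posterior(prior, some_alarm)
p_b1 = sum(p for o, p in post if ("Burglary", "h1", 1) in o)
print(len(prior), evidence, p_b1)
```

Exact `Fraction` arithmetic is used so that the prior sums to exactly 1 and the posterior can be checked by hand. The order in which `next_firing` picks rules does not change the set of leaves or their probabilities in this example, which mirrors, in a very small way, the robustness under different chases that the abstract claims for the general framework.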
Similar resources
PPDL: Probabilistic Programming with Datalog
There has been a substantial recent focus on the concept of probabilistic programming [6] towards its positioning as a prominent paradigm for advancing and facilitating the development of machine-learning applications. A probabilistic-programming language typically consists of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict t...
Semi-Inflationary DATALOG: A declarative database language with procedural features
This paper presents a rule-based database language which extends stratified DATALOG by adding a controlled form of inflationary fixpoint, immersed in a context of classical stratified negation with least fixpoint. The proposed language, called Semi-Inflationary DATALOG (DATALOG for short), smoothly combines the declarative purity of stratified negation with the procedural style of the inflation...
Dyna: Extending Datalog For Modern AI (full version)
Modern statistical AI systems are quite large and complex; this interferes with research, development, and education. We point out that most of the computation involves database-like queries and updates on complex views of the data. Specifically, recursive queries look up and aggregate relevant or potentially relevant values. If the results of these queries are memoized for reuse, the memos may...
Order in Datalog with Applications to Declarative Output
We propose an extension of Datalog that has “ordered predicates” (lists/arrays of tuples instead of sets of tuples). We previously suggested to specify output of Datalog programs declaratively by defining text pieces with their position. The proposal in the current paper reaches significantly farther by making order a first class citizen in the language. For database application programs, the o...
Sequence Datalog: Declarative String Manipulation in Databases
We investigate logic-based query languages for sequence databases, that is, databases in which strings of symbols over a fixed alphabet can occur. We discuss different approaches to querying strings, including Prolog and Datalog with function symbols, and argue that all of them have important limitations. We then present the semantics of Sequence Datalog, a logic for querying sequence databases, ...
Journal title:
- CoRR
Volume abs/1412.2221
Publication date 2014